Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum
Authors
Abstract
In this paper, we propose a sub-band speech synthesis approach for developing a high-quality Text-to-Speech (TTS) system: a sample-based spectrum is used in the high-frequency band, and a spectrum generated by HMM-based TTS is used in the low-frequency band. Here, a sample-based spectrum is the spectrum selected from a phoneme database as the most similar to the spectrum generated by HMM-based speech synthesis. The key idea is to compensate for the over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method outperforms HMM-based speech synthesis in terms of clarity and is at the same level in terms of smoothness. In addition, preference test results comparing the proposed method, HMM-based speech synthesis, and waveform speech synthesis using 80 min of speech data show that the proposed method is the most preferred.
Index terms: HMM-based speech synthesis, sub-band, waveform-based speech synthesis
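The two steps named in the abstract, selecting the database spectrum closest to the generated one and then splicing the two bands together, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Euclidean distance as the similarity measure, the fixed `cutoff_bin`, the toy random data, and the function names `select_sample_spectrum` and `combine_sub_bands`.

```python
import numpy as np


def select_sample_spectrum(generated, database):
    """Return the database row closest to `generated`.

    Assumption: similarity is plain Euclidean distance between
    spectra; the abstract does not state the actual measure.
    """
    distances = np.linalg.norm(database - generated, axis=1)
    return database[np.argmin(distances)]


def combine_sub_bands(generated, sample, cutoff_bin):
    """Keep the generated spectrum below `cutoff_bin` and the
    sample-based spectrum from `cutoff_bin` upward.

    Assumption: a hard split at a single frequency bin.
    """
    combined = generated.copy()
    combined[cutoff_bin:] = sample[cutoff_bin:]
    return combined


# Toy data standing in for a phoneme database: 50 spectra, 16 bins each.
rng = np.random.default_rng(0)
database = rng.normal(size=(50, 16))

# Toy stand-in for a statistically generated spectrum near entry 7.
generated = database[7] + 0.01 * rng.normal(size=16)

sample = select_sample_spectrum(generated, database)
hybrid = combine_sub_bands(generated, sample, cutoff_bin=8)
```

In this sketch the low band of `hybrid` comes from the generated spectrum and the high band from the selected sample. A real system would probably need to smooth the transition between the bands, which this hard split does not attempt.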
Similar resources
HMM-based speech synthesis using sub-band basis spectrum model
In this paper, we propose HMM-based text-to-speech (TTS) using a sub-band basis spectrum model (SBM). SBM can represent vocal tract spectra and phase characteristics by a linear combination of sub-band basis vectors. Some reports suggest that analysis-synthesized speech based on SBM is close to natural speech and that SBM can perform effectively in TTS. Therefore, the SBM framework is expected to have...
Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding
In this paper, we propose a sub-band basis spectrum model which is a new spectrum representation model based on a linear combination of sub-band basis vectors. We apply sparse coding to the pitch-synchronously analyzed log-spectra. Based on the approximation of the resulting basis, we obtain subband basis vectors with 1-cycle sinusoidal shapes that have mel-scale for lower frequencies and equal...
Robust speech recognition features based on temporal trajectory filtering of frequency band spectrum
This paper presents the use of a variety of filters in the temporal trajectories of frequency band spectrum to extract speech recognition features for environmental robustness. Three kinds of filters for emphasizing the statistically important parts of speech are proposed. First, a bank of RASTA-like band-pass filters to fit the statistical peaks of modulation frequency band spectrum of speech is used....
Automatic detection of voice creak
The analysis of large spontaneous speech corpora reveals that creaky mode appears more frequently than expected, especially for young female speakers. Creaky mode usually creates fundamental frequency measurement errors, and creaky voice segments must often be identified manually beforehand to avoid erroneous readings of F0 in large speech databases. Various approaches have been proposed to ident...
A Robust Front-End Processor combining Mel Frequency Cepstral Coefficient and Sub-band Spectral Centroid Histogram methods for Automatic Speech Recognition
Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and the actual speech to be recognized is caused by factors such as background noise. It can severely degrade the accuracy of recognizers based on commonly used features such as mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It is well u...